Machine Learning

Machine Learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. - Arthur Samuel (1959)

Machine Learning is an approach or subset of AI, with an emphasis on "learning" rather than just computer programming. Here a machine uses complex algorithms to analyse massive amounts of data, recognize patterns in the data, and make predictions - without requiring a person to program specific instructions into the machine's software. The system's pattern recognition improves over time as it learns from its mistakes and corrects itself, just like a human would.

Machine Learning models are able to learn from the data without the explicit help of a human. That is the main difference between machine learning models and classical algorithms. Classical algorithms are told how to find the best answer in a complex system, and the algorithm then searches for the best solutions, often faster and more efficiently than a human. However, the bottleneck here is that a human has to first come up with the best solution. In machine learning, the model is not told the best solution; instead, it is given several examples of the problem and told to figure out the best solution.

Unlike hard-coding a software program with specific instructions to complete a task, Machine Learning allows a system to learn to recognize patterns on its own and make predictions.

Artificial Intelligence vs Machine Learning

Artificial Intelligence is intelligence displayed by machines in contrast with the natural intelligence displayed by humans.

Artificial intelligence is a broader concept than machine learning, which addresses the use of computers to mimic the cognitive functions of humans.

When machines carry out tasks based on algorithms in an “intelligent” manner, that is AI. Machine learning is a subset of AI and focuses on the ability of machines to receive a set of data and learn for themselves, changing algorithms as they learn more about the information they are processing.

You can think of deep learning, machine learning and artificial intelligence as a set of Russian dolls nested within each other, beginning with the smallest and working out. Deep learning is a subset of machine learning, which is a subset of AI.

In short, machine learning and deep learning are categorized under AI, but AI isn't necessarily machine learning or deep learning.

Machine Learning vs Deep Learning

Deep Learning, a subset of Machine Learning, takes computer intelligence even further. It uses massive amounts of data and computing power to simulate deep neural networks. Essentially, these networks imitate the human brain's connectivity, classifying data sets and finding correlations between them. With its newfound knowledge, which is acquired without human intervention, the machine can then apply its insights to other data sets. The more data the machine has at its disposal, the more accurate its predictions will be.

Deep Learning can be expensive and requires massive datasets to train itself on. That's because there is a huge number of parameters that need to be understood by a learning algorithm, which can initially produce a lot of false positives. For instance, a deep learning algorithm could be instructed to "learn" what a cat looks like. It would take a massive dataset of images for it to understand the very minor details that distinguish a cat from, say, a cheetah or a panther or a fox.

Deep Learning was inspired by the structure and function of the brain, namely the interconnection of many neurons. Artificial Neural Networks (ANN) are algorithms that mimic the biological structure of the brain. In ANNs, there are "neurons" which have discrete layers and connections to other "neurons". Each layer picks out a specific feature to learn, such as curves / edges in image recognition. It's this layering that gives deep learning its name; depth is created by using multiple layers as opposed to a single layer.
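As an illustration of this layering, here is a toy forward pass through two "layers" of neurons in Python. The weights and inputs are made up for the example; a real network learns its weights from data.

```python
# A toy forward pass through two "layers" of neurons (weights are made up).
def relu(x):
    # A common activation function: negative sums become zero.
    return max(0.0, x)

def layer(inputs, weights):
    # Each neuron sums its weighted inputs and applies the activation.
    return [relu(sum(w * x for w, x in zip(ws, inputs))) for ws in weights]

hidden = layer([1.0, 2.0], [[0.5, -0.3], [0.8, 0.1]])  # first layer
output = layer(hidden, [[1.0, 1.0]])                   # second layer: the "depth"
```

Stacking more `layer` calls on top of each other is exactly what makes a network "deep".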

So what contributed to the emergence of Machine Learning? Basically, there were two factors:

  1. The realisation that Arthur Samuel had in 1959: that it was possible to teach computers to learn for themselves, rather than teaching them everything they needed to know from the start.
  2. The emergence of the internet, which created a huge database of digital information, ready for analysis.

Algorithms

An algorithm is a set of rules to be followed when solving problems. In machine learning, algorithms take in data and perform calculations to find an answer. The calculations can be very simple or they can be more on the complex side.

Algorithms need to be trained to learn how to classify and process information. The efficiency and accuracy of the algorithm are dependent on how well the algorithm was trained. Using an algorithm to calculate something does not automatically mean machine learning or AI was being used. All squares are rectangles, but not all rectangles are squares.

Unfortunately, today, we often see the machine learning and AI buzzwords being thrown around to indicate that an algorithm was used to analyze data and make a prediction. Using an algorithm to predict an outcome of an event is not machine learning. Using the outcome of your prediction to improve future predictions is.
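The distinction in the last sentence can be made concrete with a toy sketch (the rain/humidity rule, threshold, and numbers below are invented for illustration): the first function only predicts, while the second uses the outcome of each prediction to improve future predictions.

```python
# A fixed rule: predicts rain whenever humidity exceeds a hard-coded threshold.
def fixed_rule(humidity):
    return humidity > 70  # the threshold never changes, no matter how wrong it is

# A (toy) learning rule: nudges its threshold whenever a prediction is wrong.
class LearningRule:
    def __init__(self, threshold=70, step=1.0):
        self.threshold = threshold
        self.step = step

    def predict(self, humidity):
        return humidity > self.threshold

    def update(self, humidity, actually_rained):
        # Use the outcome of the prediction to improve future predictions.
        if self.predict(humidity) and not actually_rained:
            self.threshold += self.step   # too eager: raise the bar
        elif not self.predict(humidity) and actually_rained:
            self.threshold -= self.step   # too cautious: lower the bar

model = LearningRule()
for humidity, rained in [(70, True), (65, True), (70, True)]:
    model.update(humidity, rained)
# The threshold has moved down to 68, so the model now predicts rain
# at 69% humidity, where the fixed rule still does not.
```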

Types of Machine Learning

There are three main types of Machine Learning.

  • Supervised Learning - Using labeled data, trying to predict a label based on known features.
  • Unsupervised Learning - Using unlabeled data, trying to group together similar observations based on their features.
  • Reinforcement Learning - The algorithm learns to perform an action from experience.

Supervised Learning

Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known.

Simply put, supervised learning finds associations between features of a dataset and a target variable. For example, supervised learning models might try to find the association between a person's health features (heart rate, obesity level, and so on) and that person's risk of having a heart attack (the target variable).

These associations allow supervised models to make predictions based on past examples. This is often the first thing that comes to people's minds when they hear the phrase machine learning, but it in no way encompasses the realm of machine learning. Supervised Machine Learning models are often called predictive analytics models, named for their ability to predict the future based on the past.

Supervised Learning requires a certain type of data called labeled data. This means that we must teach our model by giving it historical examples that are labeled with the correct answer.

Specifically, supervised learning works using parts of the data to predict another part. First we must separate data into two parts as follows:

  • The predictors, which are the columns that will be used to make our prediction. These are sometimes called features, inputs, variables and independent variables.
  • The response, which is the column that we wish to predict. This is sometimes called outcome, label, target and dependent variable.

Supervised Learning attempts to find a relationship between the predictors and the response in order to make a prediction. The idea is that in the future a data observation will present itself and we will only know the predictors. The model will then have to use the predictors to make an accurate prediction of the response value.
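A minimal sketch of this split, using a hypothetical two-patient dataset (the column names and values are made up for illustration):

```python
# A tiny labeled dataset: each row is one past patient (a historical example).
rows = [
    {"cholesterol": 240, "blood_pressure": 140, "heart_attack": "Y"},
    {"cholesterol": 180, "blood_pressure": 120, "heart_attack": "N"},
]

response_column = "heart_attack"

# The predictors (features / inputs): every column except the response.
X = [{k: v for k, v in row.items() if k != response_column} for row in rows]
# The response (label / target): the column we wish to predict.
y = [row[response_column] for row in rows]
```

A supervised model would be trained on `X` and `y` together; for a future patient we would only have the `X` part, and the model would predict the `y` part.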

Most experts estimate that approximately 70 percent of machine learning is supervised learning.

Supervised Learning - Example 1 - Heart Attack Prediction

Suppose we wish to predict if someone will have a heart attack within a year. To predict this, we are given that person's cholesterol, blood pressure, height, smoking habits, and perhaps more. From this data, we must ascertain the likelihood of a heart attack. Suppose that, to make this prediction, we look at previous patients and their medical histories. As these are previous patients, we know not only their predictors (cholesterol, blood pressure, and so on), but also whether they actually had a heart attack (because it already happened!).

This is a supervised machine learning problem because we are:

  • Making a prediction about someone
  • Using historical training data to find relationships between medical variables and heart attacks.

The hope here is that a patient will walk in tomorrow and our model will be able to identify whether or not the patient is at risk for a heart attack based on her / his conditions (just like a doctor would).

As the model sees more and more labeled data, it adjusts itself in order to match the correct labels given to us. We can use different metrics to pinpoint exactly how well our supervised machine learning model is doing and how it can better adjust itself.
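One such metric is accuracy, the fraction of predictions that match the known labels. A minimal sketch, with made-up predictions:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the known (labeled) answers."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# 3 of these 4 hypothetical heart-attack predictions match the labels.
accuracy(["Y", "N", "N", "Y"], ["Y", "N", "Y", "Y"])  # 0.75
```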

One of the biggest drawbacks of supervised machine learning is that we need this labeled data, which can be very difficult to get hold of. To predict heart attacks, we might need thousands of patients along with all of their filled-in medical information and years' worth of follow-up records for each person, which could be a nightmare to obtain.

In short, supervised models use historical labeled data in order to make predictions about the future. Some possible applications for supervised learning include:

  • Stock price predictions
  • Weather predictions
  • Crime predictions

Supervised learning exploits the relationship between the predictors and response to make predictions, but sometimes it is enough just knowing that there even is a relationship. Suppose we are using a supervised learning model to predict whether or not a customer will purchase a given item. A possible dataset might look like this:

Person ID | Age | Gender | Employed | Bought the Product?
1         | 63  | F      | N        | Y
2         | 24  | M      | Y        | N

Note that in this case the predictors are Age, Gender, and Employed, while our response is "Bought the Product?". This is because we want to see if, given someone's age, gender, and employment status, they will buy the product.

Assume that a model is trained on this data and can make accurate predictions about whether or not someone will buy something. That, in and of itself, is exciting but there's something else that is arguably even more exciting. The fact that we could make accurate predictions implies that there is a relationship between these variables, which means that to know if someone will buy your product, you only need to know their age, gender and employment status. This might contradict the previous market research indicating that much more must be known about a potential customer to make such a prediction.

This speaks to supervised learning's ability to understand which predictors affect the response and how. For example, are women more likely to buy the product? Which age groups are prone to decline the product? Is there a combination of age and gender that is a better predictor than any one column on its own? As someone's age increases, do their chances of buying the product go up, down, or stay the same?

It is also possible that not all of the columns are necessary. A possible output of a machine learning model might suggest that only certain columns are necessary to make the prediction and that the other columns are only noise: they do not correlate to the response and therefore confuse the model.

Types of Supervised Learning

There are two types of supervised learning models: regression and classification. The difference between the two is quite simple and lies in the response variable.

Regression

Regression models attempt to predict a continuous response. This means that the response can take on a range of infinite values. Consider the following examples:

  • Dollar Amounts
    • Salary
    • Budget
  • Temperature
  • Time
    • Generally recorded in seconds or minutes
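As a minimal sketch of regression, the following fits a line to made-up salary data (years of experience versus salary) using the closed-form formula for simple linear regression, and then returns a continuous prediction:

```python
# Hypothetical data: years of experience -> salary (a continuous response).
xs = [1, 2, 3, 4]
ys = [40_000, 50_000, 60_000, 70_000]

# Closed-form simple linear regression: slope and intercept of the best-fit line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    # The response can be any value on a continuous scale.
    return intercept + slope * x

predict(5)  # 80000.0
```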

Classification

Classification attempts to predict a categorical response, which means that the response only has a finite amount of choices. Examples include the ones given as follows:

  • Cancer grade (1,2,3,4,5)
  • True / False questions such as the following examples:
    • "Will this person have a heart attack within a year?"
    • "Will you get this job?"
  • Given a photo of a face, who does this face belong to? (facial recognition)
  • Predict the year someone was born:
    • Note that there are many possible answers (over 100), but still finitely many

Supervised Learning - Example 2 - Regression

The following graphs show a relationship between three categorical variables (age, year they were born and education level) and a person's wage:

Note that even though the predictor is categorical, this example is regression because the y-axis, our dependent variable, our response, is continuous.

Our earlier heart attack example is classification because the response was "will this person have a heart attack within a year?", which has only two possible answers: Yes or No.

Sometimes it can be tricky to decide whether you should use classification or regression. Consider that we are interested in the weather outside. We could ask the question, how hot is it outside?, in which case the answer is on a continuous scale, and some possible answers are 79 degrees or 98 degrees. However, as an exercise, if we go and ask 10 people what the temperature is outside, most of them will not answer in exact degrees but will bucket their answer and say something like it's in the 60s.

We might consider this a classification problem, where the response variable is no longer in exact degrees but is in a bucket. In theory, there would only be a finite number of buckets, making the model perhaps learn the differences between the 60s and the 70s a bit better.
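Turning the continuous response into buckets is a one-line transformation; a sketch:

```python
def bucket_temperature(degrees):
    """Turn a continuous temperature into a categorical 'decade' bucket."""
    return f"{(degrees // 10) * 10}s"

bucket_temperature(64)  # "60s"
bucket_temperature(98)  # "90s"
```

A model trained on the bucketed column would then be doing classification, while a model trained on the raw degrees would be doing regression.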

Unsupervised Learning

Unsupervised learning does not deal with predictions but has a much more open objective. Unsupervised learning takes in a set of predictors and utilizes relationships between the predictors in order to accomplish tasks, such as the following:

  • Reducing the dimension of the data by condensing variables together. - An example of this would be file compression. Compression works by utilising patterns in the data and representing the data in a smaller format.
  • Finding groups of observations that behave similarly and grouping them together.

The first element on this list is called dimension reduction and the second is called clustering. Both of these are examples of unsupervised learning because they do not attempt to find a relationship between predictors and a specific response, and therefore are not used to make predictions of any kind. Unsupervised models, instead, are utilized to find organizations and representations of the data that were previously unknown.

A big advantage of unsupervised learning is that it does not require labeled data, which means that it is much easier to get data that complies with unsupervised learning models. Of course, a drawback to this is that we lose all predictive power, because the response variable holds the information needed to make predictions; without it, our model will be hopeless in making any sort of prediction.

A big drawback is that it is difficult to see how well we are doing. In a regression or classification problem, we can easily tell how well our models are predicting by comparing our model's answers to the actual answers. For example, if our supervised model predicts rain and it is sunny outside, the model was incorrect. If our supervised model predicts the price will go up by 1 dollar and it goes up by 99 cents, our model was very close! In unsupervised modeling, this concept is foreign because we have no answers to compare our model to. Unsupervised models merely suggest differences and similarities, which then require a human's interpretation.

Popular techniques include self-organising maps, nearest-neighbour mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
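As a minimal sketch of the clustering idea, here is a 1-D k-means with two clusters on made-up, unlabeled numbers. Note that no labels are involved: the algorithm only uses similarities between the points themselves.

```python
# Minimal 1-D k-means sketch (k=2): group unlabeled points by similarity.
points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centroids = [points[0], points[-1]]  # naive initialization

for _ in range(10):
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) for c in clusters]

# The algorithm discovers two groups: the small numbers and the large ones.
# clusters -> [[1.0, 1.5, 2.0], [10.0, 11.0, 12.0]]
```

What those two groups mean is left to human interpretation, which is the drawback discussed above.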

About 10 to 20 percent of machine learning is unsupervised learning, although this area is growing rapidly.

Reinforcement Learning

With reinforcement learning, the algorithm discovers for itself which actions yield the greatest rewards through trial and error. The algorithm then adjusts itself and modifies its strategy in order to accomplish some goal, which is usually to get more rewards.

Reinforcement learning has three primary components:

  1. The agent – the learner or decision maker.
  2. The environment – everything the agent interacts with.
  3. Actions – what the agent can do.

The objective is for the agent to choose actions that maximize the expected reward over a given period of time. The agent will reach the goal much quicker by following a good policy, so the goal in reinforcement learning is to learn the best policy.
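A toy sketch of this loop (the actions and rewards are invented): the agent keeps a value estimate for each action, tries the actions, updates the estimates from the rewards the environment returns, and its policy is simply to pick the action with the highest estimated value.

```python
# Toy reinforcement learning sketch: an agent learns from experience
# which of three actions yields the greatest reward.
rewards = {"left": 0.0, "forward": 1.0, "right": -1.0}  # hidden from the agent

# The agent's running estimate of each action's value.
value = {action: 0.0 for action in rewards}

for _ in range(20):
    for action in rewards:                               # try each action
        reward = rewards[action]                         # environment responds
        value[action] += 0.5 * (reward - value[action])  # update the estimate

policy = max(value, key=value.get)  # best policy learned so far: "forward"
```

Note that the reward here is not a "correct" label; it merely encourages or discourages actions, which is the key difference from supervised learning.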

This type of machine learning is very popular in AI-assisted game play, as agents (the AI) are allowed to explore a virtual world, collect rewards, and learn the best navigation techniques. This model is also popular in robotics, especially in the field of self-automated machinery, including cars.

Reinforcement learning can be thought of as similar to supervised learning, in that the agent is learning from its past actions to make better moves in the future; however, the main difference lies in the reward. The reward does not have to be tied in any way to a "correct" or "incorrect" decision. The reward simply encourages or discourages different actions.

Markov decision processes (MDPs) are popular models used in reinforcement learning.

Reinforcement learning is often used for robotics and navigation.

Overview of Types of Machine Learning

Each of the three types of machine learning has its benefits and also its drawbacks as listed:

Supervised Learning

This exploits relationships between predictors and response variables to make predictions of future data observations.

  • Pros
    • It can make future predictions
    • It can quantify relationships between predictors and response variables
    • It can show us how variables affect each other and how much
  • Cons
    • It requires labeled data, which may not be available in all cases or may be difficult to get

Unsupervised Learning

This finds similarities and differences between data points.

  • Pros
    • It can find groups of data points that behave similarly that a human would never have noted
    • It can be a preprocessing step for supervised learning - Think of clustering a bunch of data points and then using these clusters as the response!
    • It can use unlabeled data, which is much easier to find
  • Cons
    • It has zero predictive power
    • It can be hard to determine if we are on the right track
    • It relies much more on human interpretation

Reinforcement Learning

This is reward-based learning that encourages agents to take particular actions in their environments.

  • Pros
    • Very complicated reward systems create very complicated AI systems
    • It can learn in almost any environment including our own Earth
  • Cons
    • The agent is erratic at first and makes many terrible choices before realizing that these choices have negative rewards. For example, a car might crash into a wall and not know that that is not okay until the environment negatively rewards it.
    • It can take a while before the agent learns to avoid these decisions altogether.
    • The agent might play it safe and only choose one action and be "too afraid" to try anything else for fear of being punished

Machine Learning isn't perfect

There are many caveats of machine learning. Many are specific to different models being implemented, but there are some assumptions that are universal for any machine learning model, as follows:

  • The data used is, for the most part, preprocessed and cleaned. Almost no machine learning model will tolerate dirty data with missing values or categorical values. Use dummy variables and filling / dropping techniques to handle these discrepancies.
  • Each row of a cleaned dataset represents a single observation of the environment we are trying to model.
  • If our goal is to find relationships between variables, then there is an assumption that there is some kind of relationship between these variables. This assumption is particularly important. Many machine learning models take this assumption seriously. These models are not able to communicate that there might not be a relationship.
  • Machine learning models are generally considered semiautomatic, which means that intelligent decisions by humans are still needed. The machine is very smart but has a hard time putting things into context. The output of most models are a series of numbers and metrics attempting to quantify how well the model did. It is up to a human to put these metrics into perspective and communicate the results to an audience.
  • Most machine learning models are sensitive to noisy data. This means that the models get confused when you include data that doesn't make sense. For example, if you are attempting to find relationships between economic data around the world and one of your columns is puppy adoption rates in the capital city, that information is likely not relevant and will confuse the model.
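The first point above (dummy variables and filling techniques) can be sketched in plain Python with a made-up two-row dataset; libraries such as pandas provide the same operations at scale.

```python
# Sketch: filling a missing numeric value and turning a categorical
# column into dummy (one-hot) variables, with made-up data.
rows = [
    {"age": 63, "gender": "F"},
    {"age": None, "gender": "M"},   # a missing value most models cannot tolerate
]

# Filling technique: replace the missing age with the mean of the known ages.
known = [r["age"] for r in rows if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in rows:
    if r["age"] is None:
        r["age"] = mean_age

# Dummy variables: replace the categorical column with one 0/1 column per category.
for r in rows:
    gender = r.pop("gender")
    r["gender_F"] = int(gender == "F")
    r["gender_M"] = int(gender == "M")
```

After this, every value in every row is numeric, which is the form most models expect.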
